While functional regression models have received increasing attention recently, most existing approaches assume both a linear relationship and a scalar response variable. We suggest a new method, “Functional Response Additive Model Estimation” (FRAME), which extends the usual linear regression model to situations involving functional predictors, $X_j(t)$, scalar predictors, $Z_k$, and functional responses, $Y(s)$. Our approach uses a penalized least squares optimization criterion to automatically perform variable selection in situations involving multiple functional and scalar predictors. In addition, our method uses an efficient coordinate descent algorithm to fit general nonlinear additive relationships between the predictors and response.

We develop our model for novel forecasting challenges in the entertainment industry. In particular, we set out to model the decay rate of demand for Hollywood movies using the predictive power of online virtual stock markets (VSMs). VSMs are online communities that, in a market-like fashion, gather the crowds’ prediction about demand for a particular product. Our fully functional model captures the pattern of pre-release VSM trading prices and provides superior predictive accuracy of a movie’s post-release demand in comparison to traditional methods. In addition, we propose graphical tools which give a glimpse into the causal relationship between market behavior and box office revenue patterns, and hence provide valuable insight to movie decision makers.

1. Introduction. Functional data analysis (FDA) has become an important topic of study in recent years, in part because of its ability to capture patterns and shapes in a parsimonious and automated fashion [Ramsay and Silverman (2005)]. Recent methodological advances in FDA include


Received August 2013; revised July 2014.

Supported in part by NSF CAREER Award DMS-11-50318.

Key words and phrases. Functional data, nonlinear regression, penalty functions, forecasting, virtual markets, movies, Hollywood.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2014, Vol. 8, No. 4, 2435-2460. This reprint differs from the original in pagination and typographic detail.




FAN, FOUTZ, JAMES AND JANK 


functional principal components analysis [James, Hastie and Sugar (2000), Rice and Wu (2001)], regression with functional responses [Zeger and Diggle (1994)] or functional predictors [Ferraty and Vieu (2002), James and Silverman (2005)], functional linear discriminant analysis [James and Hastie (2001), Ferraty and Vieu (2003)], functional clustering [James and Sugar (2003), Bar-Joseph et al. (2003)] or functional forecasting [Zhang, Jank and Shmueli (2010)].

In this paper we are interested in the regression situation involving $p$ different functional predictors, $X_1(t), \ldots, X_p(t)$. Most existing functional regression models assume a linear relationship between the response and predictors [Yao, Müller and Wang (2005a)], which is often an overly restrictive assumption. Recently, several papers have suggested approaches for performing nonlinear functional regressions [James and Silverman (2005), Chen, Hall and Müller (2011), Fan, James and Radchenko (2014)] of the form

(1)    $Y_i = \sum_{j=1}^{p} f_j(X_{ij}) + \varepsilon_i, \qquad i = 1, \ldots, n,$

where the $f_j$’s are general nonlinear functions of $X_{ij}(t)$ and $Y_i$ is a centered response. Generally speaking, these approaches operationalize estimation of equation (1) by using functional index models. While all of these approaches provide a very flexible extension of the linear functional model, they are designed for scalar responses only. In this paper, we generalize this framework to functional responses. That is, we consider both functional predictors $X_{ij}(t)$ and functional responses $Y_i(s)$ and allow them to be related in a nonlinear way.

We refer to our proposed nonlinear functional regression method as “Functional Response Additive Model Estimation” (FRAME), which handles multiple functional predictors as well as functional responses. Beyond the extension to functional responses, FRAME makes two additional important contributions to the existing literature. First, it uses a penalized least squares approach to efficiently fit high-dimensional functional models while simultaneously performing variable selection to identify the relevant predictors, an area that has received very little attention in the functional domain. FRAME is computationally tractable because we use a highly efficient coordinate descent algorithm to optimize our criterion. Second, FRAME extends beyond the standard linear regression setting to fit general nonlinear additive models. Since the predictors, $X_{ij}(t)$, are infinite dimensional, any functional regression model must perform some kind of dimension reduction. FRAME achieves this goal by modeling the response as a nonlinear function of a one-dimensional linear projection of $X_{ij}(t)$, a functional version of the single index model approach. Our method uses a supervised fit to automatically project the functional predictors into the best one-dimensional space. We believe this is an important distinction because projecting into the unsupervised PCA space is currently the dominant approach in functional regressions, even though it is well known that this space need not be optimal for predicting the response. Our nonlinear approach allows us to model much more subtle relationships and we show that, on our data, FRAME produces clear improvements in terms of prediction accuracy.

We develop our model for novel forecasting challenges in the motion picture industry. Providing accurate forecasts for the success of new products is crucial for the 500 billion dollar entertainment industries (such as motion picture, music, TV, gaming and publishing). These industries are confronted with enormous investments, short product life-cycles, and highly uncertain and rapidly decaying demand. For instance, decision makers in the movie industry are keenly interested in accurately forecasting a product’s demand pattern [Sawhney and Eliashberg (1996), Bass et al. (2001)] in order to allocate, for example, weekly advertising budgets according to the predicted rate of demand decay, that is, according to whether a film is expected to open big and then decay fast, or whether it opens only moderately but decays very slowly.

However, forecasting demand patterns is challenging since it is highly 
heterogeneous across different products. Take, for instance, the sample of 
movie demand patterns in Figure 1. Here we have plotted the log weekly 
box office revenues for the first ten weeks from the release date for a number 
of different movies. While revenues for some movies (e.g., 13 GOING ON 30 
and 50 FIRST DATES) decay exponentially over time, revenues for others 
(e.g., BEING JULIA) increase first before decreasing later. Even for movies 
with similar demand patterns (e.g., those on the second row of Figure 1), 
the speed of decay varies greatly. 

In this article we develop FRAME to forecast the demand patterns of 
box office revenues using a number of functional predictors which capture 
various sources of information about movies, such as consumers’ word of 
mouth, via a novel data source, online virtual stock markets (VSMs). In 
a VSM, participants trade virtual stocks according to their predictions of 
the outcome of the event represented by the stock (e.g., the demand for 
an upcoming movie). As a result, VSM trading prices can provide early 
and reliable demand forecasts [Spann and Skiera (2003), Foutz and Jank 
(2010)]. VSMs are especially intriguing from a statistical point of view since 
the shape of the trading prices may reveal additional information, such as 
the speed of information diffusion which, in turn, can proxy for consumer 
sentiment and word of mouth about a new product [Foutz and Jank (2010)]. 
For instance, a last-moment price spurt may reveal a strengthening hype for 
a product and may thus be essential in forecasting its demand. 

Fig. 1. Movie demand decay rates for a sample of movies.

This paper is organized as follows. In the next section we provide further background on virtual stock markets in general and our data in particular. In Section 3 we present the FRAME model and its optimization criterion. We also discuss an efficient coordinate descent algorithm for fitting FRAME. In Section 4 an extensive simulation study is used to demonstrate the superior performance of FRAME, in comparison to a number of competitors. Section 5 discusses the results from applying FRAME to our movie data. In that section, we also address the challenge of interpreting the results from a model involving both functional predictors and functional responses using “dependence plots.” Dependence plots graphically illustrate, for typical shapes of the predictors, the corresponding predicted response pattern. These dependence plots allow for a glimpse into the relationship between response and predictors and provide actionable insight for decision makers. We conclude with further remarks in Section 6.

2. Data. We have two different sources of data. Our input data (i.e., functional predictors) come from the weekly trading histories of an online virtual stock market for movies before their releases; our output data (i.e., functional responses) pertain to the post-release weekly demand of those movies. We have data on a total of 262 movies. The data sources are described below.

2.1. Online virtual stock markets. Online virtual stock markets (VSMs) operate in ways very similar to real life stock markets except that they are not necessarily based on real currency (i.e., participants often use virtual currency to make trades), and that each stock corresponds to discrete outcomes or continuous parameters of an event (rather than a company’s value). For instance, a value of $54 for the movie stock 50 FIRST DATES is interpreted as the traders’ collective belief that the movie will accrue $54 million in the box office during its first four weeks of theatrical exhibition. If the movie eventually earns $64 million, then traders holding the stock will liquidate (or “cash-in”) at $64 per share.

The source of our data is the Hollywood Stock Exchange (HSX), one of 
the best known online VSMs. HSX was established in 1996 and aims at 
predicting a movie’s revenues over its first four weeks of theatrical exhibition. 
HSX has had well over 2 million active participants worldwide and each 
trader is initially endowed with $2 million virtual currency and can increase 
his or her net worth by strategically selecting and trading movie stocks 
(such as by buying low and selling high). Traders are further motivated by 
opportunities to appear on the daily Leader Board that features the most 
successful traders. 

For each movie we collect four functional predictors between 52 and 10 weeks prior to the movie’s release date. They are the following: the intraday average price (i.e., the average of the highest and lowest trading prices of the day, as recorded by HSX) on each Friday (which is the most active trading day of the week), each Friday’s number of accounts shorting the stock, number of shares sold, and number of shares held short. Figure 2 shows the curves for one of these predictors, average price, for the movie demand patterns from Figure 1. Note that since our goal is to accomplish early forecasts, we only consider information between 52 and 10 weeks prior to a movie’s release (i.e., up to week $-10$ in Figure 2). We form predictions of movie decay ten weeks prior to release because this provides a realistic time frame for managers to make informed decisions about marketing mix allocations and other strategic decisions. Of course our analysis could also be performed using data closer to the release date.

Our FRAME method captures differences in shapes of VSM trading histories (such as price or volume), for example, trending up or down, concavity vs. convexity or last-moment spurts. The empirical results in Section 5 show that these shapes are predictive of the demand pattern over a product’s life cycle. For example, a rapid increase in early VSM trading prices may suggest a rapid diffusion of awareness among potential adopters and a strong interest in a product. Thus, it can suggest a strong initial demand immediately after a new product’s introduction to the market place, for example, a strong opening weekend box office for a movie. Similarly, a new product whose trading prices increase very sharply over the pre-release period may be experiencing strong positive word of mouth, which may lead to both a strong opening weekend and a reduced decay rate in demand for the movie, that is, increased longevity.

Fig. 2. HSX trading histories for the sample of movies from Figure 1.

2.2. Weekly movie demand patterns. Our goal is to predict a movie’s demand (i.e., its box office revenue). Specifically, we want to predict a movie’s demand not only for a given week (e.g., at week 1 or week 5), but over its entire theatrical life cycle of about 10 weeks (i.e., from its opening week 1 to week 10). Figure 3 shows weekly demand for all 262 movies in our data (on the log-scale). The left panel plots the distribution across all movies and weeks; we can see that (log) demand is rather symmetric and appears to be bi-modal. We can also see that a portion of the data equals zero; these correspond to movies with zero demand, particularly in later weeks (the constant 1 was added to all revenues before taking the log transformation). During weeks 1 and 2, every movie has positive revenue. In week 3, only 4 movies have zero revenue; this number increases to 67 movies by week 10. The right panel shows, for each individual movie, the rate at which demand decays over the 10-week period. We can see that whereas some movies decay gradually, a number have sudden drops, while others initially increase after the release week. Our goal is to characterize different demand decay shapes and to use the information from the VSM to forecast these shapes.

Fig. 3. Distribution of movies’ weekly demand and demand decay patterns. The right panel shows 10-week decay patterns (from the release week until 9 weeks after release) for the 262 movies in our sample; the left panel shows the distribution of the corresponding 10 × 262 = 2620 weekly log-revenues.
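The log transformation described in Section 2.2 (adding the constant 1 to each revenue before taking logs) can be sketched as follows. This is a minimal illustration rather than the code used for the analysis; the function name and the example revenue path are hypothetical.

```python
import numpy as np

def log_revenue(weekly_revenue):
    """Return log(1 + revenue), so that zero-revenue weeks map to exactly 0."""
    revenue = np.asarray(weekly_revenue, dtype=float)
    return np.log1p(revenue)

# Hypothetical 10-week revenue path (in dollars) for a single movie.
example_revenue = np.array([5.0e6, 2.1e6, 9.0e5, 3.0e5, 8.0e4, 1.0e4, 0, 0, 0, 0])
example_log = log_revenue(example_revenue)
```

Adding 1 before the log keeps weeks with zero box office revenue in the data as zeros on the log scale instead of producing undefined values.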

3. Functional Response Additive Model Estimation. In this section we develop our Functional Response Additive Model Estimation (FRAME) approach for relating a functional response, $Y_i(s)$, to a set of $p$ functional predictors, $X_{i1}(t), \ldots, X_{ip}(t)$, and $q$ univariate predictors, $Z_{i1}, \ldots, Z_{iq}$, where $i = 1, \ldots, n$.

3.1. FRAME model. The classical functional linear regression model is given by

(2)    $Y_i(s) = \int \beta(s,t) X_i(t)\,dt + \varepsilon_i(s),$

where $\beta(s,t)$ is a smooth two-dimensional coefficient function to be estimated as part of the fitting process. Note we assume throughout that the predictors and responses have been centered so that the intercept term can be ignored. We also assume that the response curves $Y_i(s)$ are independent, given $X_i(t)$; for work on correlated response curves, see, for example, Di et al. (2009) or Crainiceanu, Caffo and Morris (2011).

The model given by (2) has been applied in many settings. However, it has two obvious deficiencies for use with our data. First, it assumes a single functional predictor, whereas our data contains $p = 4$ functional predictors and a number of univariate predictors. Second, the integral in (2) is a natural analogue of the summation term in the linear regression model. Hence, (2) assumes a linear relationship between the predictor and the response. In many situations this assumption is too restrictive, so we wish to allow for a nonlinear relationship.

In this paper we model the relationship between the response function and the predictors using the following nonlinear additive model:

(3)    $Y_i(s) = \sum_{j=1}^{p} f_j(s, X_{ij}) + \sum_{k=1}^{q} \phi_k(s, Z_{ik}) + \varepsilon_i(s),$

where $f_j(s,x)$ and $\phi_k(s,z)$ are general nonlinear functions to be estimated. Model (3) has the advantage that it is able to incorporate all $p + q$ predictors using a natural additive model. It is also flexible enough to model nonlinear relationships. However, fitting (3) poses some significant difficulties. First, if $p$ or $q$ are large relative to $n$, we end up in a high-dimensional situation where many different nonlinear functions must be estimated. We address this issue by fitting (3) using a penalized least squares criterion. Our penalized approach has the effect of automatically performing variable selection on the predictors, in a similar fashion to the lasso [Tibshirani (1996)] or group lasso [Yuan and Lin (2006)] methods. Hence, we can very effectively deal with a large number of predictors. Second, even for a low value of $p$, estimating a completely general $f_j(s,x)$ is infeasible because $X_{ij}(t)$ is itself an infinite-dimensional function. Instead we model $f_j(s,x)$ using a functional single index model:

$f_j(s, X_{ij}) = g_j\left( \int \beta_j(s,t) X_{ij}(t)\,dt \right),$


where $\beta_j(s,t)$ is a two-dimensional index function which projects $X_{ij}(t)$ into a single direction and $g_j(x)$ is a one-dimensional function representing the nonlinear impact of the projection on $Y_i(s)$. In this way the task of estimating $f_j(s,x)$ is reduced to the simpler problem of estimating $\beta_j(s,t)$ and $g_j(x)$. Note that our primary interest in this paper is in forming accurate predictions for the response, $Y_i(s)$. Hence, we are generally not concerned with identifiability of $g_j(x)$ and $\beta_j(s,t)$, which would be more important in an inference setting. Nevertheless, empirically we have found that $g_j(x)$ and $\beta_j(s,t)$ can often be well estimated.

Using this functional index model, (3) reduces to

(4)    $Y_i(s) = \sum_{j=1}^{p} g_j\left( \int \beta_j(s,t) X_{ij}(t)\,dt \right) + \sum_{k=1}^{q} \phi_k(s, Z_{ik}) + \varepsilon_i(s).$

We then model $\beta_j(s,t) = \mathbf{b}(s,t)^T \eta_j$ and $X_{ij}(t) = \mathbf{b}(t)^T \theta_{ij}$, where $\mathbf{b}(s,t)$ and $\mathbf{b}(t)$ are appropriately chosen basis functions. In implementation, to ensure that $\beta_j(s,t)$ and $g_j(x)$ are identifiable, we constrain $\|\eta_j\| = 1$ for all $j$. Using this representation,

(5)    $\int \beta_j(s,t) X_{ij}(t)\,dt = \theta_{ij}^T \left[ \int \mathbf{b}(t)\mathbf{b}(s,t)^T\,dt \right] \eta_j = \tilde{\theta}_{ij}(s)^T \eta_j,$

where $\tilde{\theta}_{ij}(s) = \left[ \int \mathbf{b}(s,t)\mathbf{b}(t)^T\,dt \right] \theta_{ij}$. Note that $\eta_j$ must be estimated as part of the fitting process, but $\tilde{\theta}_{ij}(s)$ can be assumed known for all $s$ because $\mathbf{b}(s,t)$ and $\mathbf{b}(t)$ are given, so the integral can be directly computed. In addition, $\theta_{ij}$ can be easily computed since $X_{ij}(t)$ is directly observed.
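As a rough illustration of (5), the following sketch computes the basis coefficients of one observed curve by least squares and then forms the projected coefficients at a given $s$ by numerical quadrature. The monomial bases, the tensor-product form of the $(s,t)$ basis, the trapezoid rule, and all function names are our assumptions for the example; the fits reported in this paper use spline bases.

```python
import numpy as np

def poly_basis(t, dim):
    """Stand-in basis b(t): monomials 1, t, ..., t^(dim-1) (hypothetical choice)."""
    return np.vander(np.asarray(t, dtype=float), dim, increasing=True)

def tensor_basis(s, t, dim_s, dim_t):
    """Stand-in basis b(s, t): products of monomials in s and t, shape (len(t), dim_s*dim_t)."""
    bs = poly_basis(np.array([s]), dim_s)[0]   # basis in s evaluated at one point
    bt = poly_basis(t, dim_t)                  # basis in t on the whole grid
    return np.einsum("a,tb->tab", bs, bt).reshape(len(t), dim_s * dim_t)

def trapezoid_weights(t):
    """Quadrature weights w so that sum(w * f(t)) approximates the integral of f."""
    w = np.zeros_like(t, dtype=float)
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def theta_tilde(x_obs, t, s, dim_x=4, dim_s=3, dim_t=3):
    """Return (theta, theta_tilde(s)) for one curve x_obs observed on the grid t."""
    B_t = poly_basis(t, dim_x)
    theta, *_ = np.linalg.lstsq(B_t, x_obs, rcond=None)   # X(t) ~ b(t)^T theta
    B_st = tensor_basis(s, t, dim_s, dim_t)
    w = trapezoid_weights(t)
    M = B_st.T @ (w[:, None] * B_t)   # approximates the integral of b(s,t) b(t)^T dt
    return theta, M @ theta
```

Because the matrix `M` depends only on the two bases and the grid, it can be computed once per value of $s$ and reused for every curve, which is why the projected coefficients can be treated as known during fitting.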

Using this basis representation, (4) becomes

(6)    $Y_i(s) = \sum_{j=1}^{p} g_j\left( \tilde{\theta}_{ij}(s)^T \eta_j \right) + \sum_{k=1}^{q} \phi_k(s, Z_{ik}) + \varepsilon_i(s).$

In practice, the response function, $Y_i(s)$, will generally be observed at a finite set of time points, $s_1, \ldots, s_L$. For example, for the box office data the revenues are observed at each of the first ten weeks. In this situation (6) can be represented as

(7)    $Y_{il} = \sum_{j=1}^{p} g_j\left( \tilde{\theta}_{ijl}^T \eta_j \right) + \sum_{k=1}^{q} \phi_k(s_l, Z_{ik}) + \varepsilon_{il}, \qquad i = 1, \ldots, n, \; l = 1, \ldots, L,$

where $Y_{il} = Y_i(s_l)$, $\tilde{\theta}_{ijl} = \tilde{\theta}_{ij}(s_l)$ and the $\varepsilon_{il}$ are assumed to be independent for all $i$ and $l$ [conditional on $X_{ij}(t)$ and $Z_{ik}$].


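3.2. FRAME optimization criterion. Fitting FRAME requires estimating the unobserved parameters, $g_j(x)$, $\eta_j$ and $\phi_k(s,z)$, which we achieve using a supervised least squares penalization approach. In particular, the FRAME fit is produced by minimizing the following criterion over a grid of possible values for the tuning parameter $\lambda > 0$:

(8)    $\frac{1}{2} \sum_{i=1}^{n} \int \left( Y_i(s) - \sum_{j=1}^{p} g_j\left( \tilde{\theta}_{ij}(s)^T \eta_j \right) - \sum_{k=1}^{q} \phi_k(s, Z_{ik}) \right)^2 ds + \lambda \left( \sum_{j=1}^{p} \rho(\|f_j\|) + \sum_{k=1}^{q} \rho(\|\phi_k\|) \right),$

where $\|f_j\|^2 = \sum_{i=1}^{n} \int f_j(s, X_{ij})^2\,ds$ with $f_j(s, X_{ij}) = g_j(\tilde{\theta}_{ij}(s)^T \eta_j)$, $\|\phi_k\|^2 = \sum_{i=1}^{n} \int \phi_k(s, Z_{ik})^2\,ds$ and $\rho(\cdot)$ is a penalty function.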

The first term in (8) corresponds to the squared error between $Y_i(s)$ and the FRAME prediction, integrated over $s$, and ensures an accurate fit to the data. The second term places a penalty on the $\ell_2$ norms of the $f_j(s,x)$’s and $\phi_k(s,z)$’s. Note that penalizing the squared $\ell_2$ norms, $\|f_j\|^2$ and $\|\phi_k\|^2$, would be analogous to performing ridge regression. However, we are penalizing the square root of this quantity, which has the effect of shrinking some of the functions exactly to zero and hence performing variable selection in a similar fashion to the group lasso [Yuan and Lin (2006), Simon et al. (2013)].
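To see why the unsquared norm yields exact zeros while the squared norm does not, consider a simplified one-group problem with a fixed residual vector $r$. This is our own illustration of the standard group soft-thresholding result, stated under the assumption that the fitted vector $f$ is unconstrained; it is not part of the FRAME criterion itself.

```latex
\hat f_{\text{group}}
  = \arg\min_{f} \tfrac{1}{2}\|r - f\|^2 + \lambda \|f\|
  = \left(1 - \frac{\lambda}{\|r\|}\right)_{+} r ,
\qquad
\hat f_{\text{ridge}}
  = \arg\min_{f} \tfrac{1}{2}\|r - f\|^2 + \lambda \|f\|^2
  = \frac{r}{1 + 2\lambda} .
```

The group solution is exactly zero whenever $\|r\| \le \lambda$, whereas the ridge solution only rescales $r$ and is never exactly zero for $r \ne 0$.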

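For a response sampled at a finite set of evenly spaced time points, $s_1, s_2, \ldots, s_L$, we approximate (8) by

(9)    $\frac{1}{2} \sum_{i=1}^{n} \sum_{l=1}^{L} \left( Y_{il} - \sum_{j=1}^{p} g_j\left( \tilde{\theta}_{ijl}^T \eta_j \right) - \sum_{k=1}^{q} \phi_k(s_l, Z_{ik}) \right)^2 + \lambda \left( \sum_{j=1}^{p} \rho(\|f_j\|) + \sum_{k=1}^{q} \rho(\|\phi_k\|) \right),$

where $Y_{il} = Y_i(s_l)$, $\tilde{\theta}_{ijl} = \tilde{\theta}_{ij}(s_l)$, and the norms $\|f_j\|$ and $\|\phi_k\|$ are computed with sums over $l$ in place of the integrals over $s$. This approximation assumes that the response has been sampled at a dense enough set of points that the integral is well approximated by the summation term. This approximation worked well for our data, but for sparsely sampled responses one would need to first fit a smooth approximation of the response and sample the fitted curve over a dense set of time points.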

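We further assume that $g_j(x)$ and $\phi_k(s,z)$ can, respectively, be well approximated by basis functions $\mathbf{h}(x)$ and $\boldsymbol{\omega}(s,z)$ such that $g_j(x) \approx \mathbf{h}(x)^T \xi_j$ and $\phi_k(s,z) \approx \boldsymbol{\omega}(s,z)^T \alpha_k$. At each response time point $s_l$, let $\mathbf{h}_{ijl} = \mathbf{h}(\tilde{\theta}_{ijl}^T \eta_j)$ and $\boldsymbol{\omega}_{ikl} = \boldsymbol{\omega}(s_l, Z_{ik})$ with $\tilde{\theta}_{ijl}$ defined in (9). Then using this basis representation, (9) can be expressed as

(10)    $\frac{1}{2} \left\| \mathbf{Y} - \sum_{j=1}^{p} H_j \xi_j - \sum_{k=1}^{q} \Omega_k \alpha_k \right\|^2 + \lambda \left( \sum_{j=1}^{p} \rho(\|H_j \xi_j\|) + \sum_{k=1}^{q} \rho(\|\Omega_k \alpha_k\|) \right),$

where $\mathbf{Y}$ is the vector of stacked responses $Y_{il}$, $H_j$ is a matrix with rows $\mathbf{h}_{1j1}, \mathbf{h}_{1j2}, \ldots, \mathbf{h}_{1jL}, \mathbf{h}_{2j1}, \ldots, \mathbf{h}_{njL}$ and $\Omega_k$ is defined similarly using $\boldsymbol{\omega}_{ikl}$. The FRAME fit is then produced by minimizing (10) over the $\xi_j$’s, $\alpha_k$’s and $\eta_j$’s.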
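
Algorithm 1 Step 1 of FRAME algorithm

0. Initialize $S_j = (H_j^T H_j)^{-1} H_j^T$ and $S_k = (\Omega_k^T \Omega_k)^{-1} \Omega_k^T$ for $j = 1, \ldots, p$ and $k = 1, \ldots, q$, where the matrices $H_j$ and $\Omega_k$ are defined in (10).

For each $j \in \{1, \ldots, p\}$ and $k \in \{1, \ldots, q\}$:

1. Fix all $\hat{\xi}_{j'}$ for $j' \ne j$. Compute the residual vector $R_j = \mathbf{Y} - \sum_{j' \ne j} H_{j'} \hat{\xi}_{j'} - \sum_{k=1}^{q} \Omega_k \hat{\alpha}_k$.

2. Let $\hat{\xi}_j = c_j S_j R_j$ where $c_j = (1 - \lambda / \|H_j S_j R_j\|)_+$ is a shrinkage parameter.

3. Center $f_j \leftarrow f_j - \mathrm{mean}(f_j)$.

4. Fix all $\hat{\alpha}_{k'}$ for $k' \ne k$. Compute the residual vector $R_k = \mathbf{Y} - \sum_{j=1}^{p} H_j \hat{\xi}_j - \sum_{k' \ne k} \Omega_{k'} \hat{\alpha}_{k'}$.

5. Let $\hat{\alpha}_k = c_k S_k R_k$ where $c_k = (1 - \lambda / \|\Omega_k S_k R_k\|)_+$ is a shrinkage parameter.

6. Center $\phi_k \leftarrow \phi_k - \mathrm{mean}(\phi_k)$.

Repeat 1 through 6 and iterate until convergence.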


3.3. FRAME optimization algorithm. For a given value of $\lambda$, we break the problem of minimizing (10) into two iterative steps, where we first estimate $\xi_j$ and $\alpha_k$ given $\eta_j$, and second estimate $\eta_j$ given $\xi_j$ and $\alpha_k$. One advantage of this approach is that the minimization of (10) in the first step can be achieved using an efficient coordinate descent algorithm which we summarize in Algorithm 1.

Our approach has the same general form as similar algorithms used in other settings. In particular, arguments similar to those in Ravikumar et al. (2009) and Fan, James and Radchenko (2014) prove that Algorithm 1 will minimize a penalized criterion of the form given by (10) provided $\rho(t) = t$. We discuss the extension to a general penalty function in the Appendix. Note that the $S_j$ and $S_k$ matrices defined in Algorithm 1 only need to be computed once so the calculations in 1 through 6 of Algorithm 1 can all be performed efficiently.
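The block updates in Algorithm 1 can be sketched in a few lines of NumPy. This is a simplified illustration under our own assumptions, not the implementation used for the results in this paper: it takes the identity penalty, treats the functional and scalar design matrices as one list of blocks, assumes each block has centered columns (so the explicit centering steps are omitted), and uses the pseudo-inverse as the least squares operator. The function name and arguments are hypothetical.

```python
import numpy as np

def group_coordinate_descent(Y, H_list, lam, n_iter=100, tol=1e-8):
    """Block coordinate descent for 0.5*||Y - sum_j H_j xi_j||^2 + lam * sum_j ||H_j xi_j||."""
    # Least squares operators; equal to (H^T H)^{-1} H^T when H has full column rank.
    S_list = [np.linalg.pinv(H) for H in H_list]
    xi = [np.zeros(H.shape[1]) for H in H_list]
    fits = [np.zeros(len(Y)) for _ in H_list]
    for _ in range(n_iter):
        max_change = 0.0
        for j, (H, S) in enumerate(zip(H_list, S_list)):
            R = Y - sum(fits) + fits[j]          # partial residual excluding block j
            ls_coef = S @ R                      # unpenalized least squares coefficients
            norm = np.linalg.norm(H @ ls_coef)
            # Shrinkage factor (1 - lam / ||H S R||)_+ ; zero removes the block entirely.
            c = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
            new_fit = H @ (c * ls_coef)
            max_change = max(max_change, float(np.max(np.abs(new_fit - fits[j]))))
            xi[j], fits[j] = c * ls_coef, new_fit
        if max_change < tol:
            break
    return xi
```

A block whose least squares fit to the partial residual has norm below `lam` receives a shrinkage factor of zero, which is how the corresponding predictor is dropped from the model.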

In the second step we estimate $\eta_j$, given current estimates for the $\xi_j$’s and $\alpha_k$’s, by minimizing the sum of squares term

(11)    $\sum_{i=1}^{n} \sum_{l=1}^{L} \left\{ Y_{il} - \sum_{j=1}^{p} g_j\left( \tilde{\theta}_{ijl}^T \eta_j \right) - \sum_{k=1}^{q} \phi_k(s_l, Z_{ik}) \right\}^2$

over $\eta_j$. Note that we do not include the penalty when estimating $\eta_j$ because the $\eta_j$’s are providing a direction in which to project $X_{ij}(t)$ and are thus constrained to be norm one. Hence, applying a shrinkage term would be inappropriate. Minimization of (11) can be approximately achieved using a first order Taylor series approximation of $g_j(x)$. We provide the details on this minimization in the Appendix.

Formally, the FRAME algorithm is summarized in Algorithm 2.

Algorithm 2 FRAME algorithm

0. Choose initial values for $\hat{\eta}_j$ for $j \in \{1, \ldots, p\}$.

1. Compute $\mathbf{h}_{ijl}$ using the current estimates for $\eta_j$. Estimate $\xi_j$ and $\alpha_k$ using Algorithm 1.

2. Conditional on the $\xi_j$’s and $\alpha_k$’s from step 1, estimate the $\eta_j$’s by minimizing (11).

3. Repeat steps 1 and 2 and iterate until convergence.

3.4. Tuning parameters. Fitting FRAME requires selecting the regularization parameter $\lambda$ and the basis functions $\mathbf{b}(t)$, $\mathbf{b}(s,t)$, $\mathbf{h}(x)$ and $\boldsymbol{\omega}(s,z)$ defined in (5) and (10). For our simulations and the HSX data we used cubic splines to model $\mathbf{h}(x)$, $\mathbf{b}(t)$ and $\mathbf{b}(s,t)$, and a simple linear representation for $\boldsymbol{\omega}(s,z)$ so $\phi_k(s, Z_k) = Z_k \alpha_k$. We selected the dimensions of these bases simultaneously using 10-fold cross-validation (CV) based on prediction error. More specifically, we chose a grid of values for the dimension of each basis and randomly partitioned the original sample into 10 subsamples of equal size. For each fold $k = 1, \ldots, 10$, we used 9 subsamples to fit the model with dimensions of these bases fixed at a given combination of the grid values, and used the remaining subsample to calculate the prediction error. The cross-validated prediction error is then calculated as the average prediction error over the 10 validation subsamples. Thus, for every combination of basis dimensions, we obtained one cross-validated prediction error. The final selected dimensions for these basis functions are the ones which minimize the 10-fold cross-validated prediction error. Since the FRAME algorithm is very efficient, this approach worked well on our data.
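The grid search over basis dimensions can be sketched generically as below. This is an illustration under our own assumptions: `fit_predict` stands for any routine that fits a model on training data and returns predictions for held-out data, and a polynomial regression (`poly_fit_predict`) is used as a stand-in model since a full FRAME fit is beyond a short example. All names are hypothetical.

```python
import numpy as np
from itertools import product

def cv_error(fit_predict, X, Y, params, folds):
    """Average held-out squared prediction error over the given folds."""
    errs = []
    for test_idx in folds:
        train = np.ones(len(Y), dtype=bool)
        train[test_idx] = False
        pred = fit_predict(X[train], Y[train], X[test_idx], **params)
        errs.append(np.mean((Y[test_idx] - pred) ** 2))
    return float(np.mean(errs))

def select_by_cv(fit_predict, X, Y, grid, n_folds=10, seed=0):
    """Return (best_params, best_error) over all combinations in grid."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), n_folds)  # random partition
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(fit_predict, X, Y, params, folds)
        if best is None or err < best[1]:
            best = (params, err)
    return best

def poly_fit_predict(x_train, y_train, x_test, dim):
    """Stand-in model: polynomial regression with dim coefficients."""
    coef = np.polyfit(x_train, y_train, deg=dim - 1)
    return np.polyval(coef, x_test)
```

The same folds are reused for every grid combination, so differences in cross-validated error reflect the tuning parameters and not the random partition.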

To compute $\lambda$, one could potentially add a grid of values for $\lambda$ to the above 10-fold CV, fit FRAME over all possible combinations of the tuning parameter values, and select the “best” value. However, a more efficient approach is to compute initial estimates for $\eta_j$, minimize (10) over $\xi_j$ and $\alpha_k$ for each possible value of $\lambda$, choose the $\xi_j$’s and $\alpha_k$’s corresponding to the value of $\lambda$ with the lowest 10-fold CV, estimate the $\eta_j$’s for only this one set of parameters, and iterate. This approach means that, for each iteration, the minimization of (11) only needs to be performed for a single value of $\lambda$. We found this approach worked well for choosing the tuning parameters in both our simulated and real data analyses.

4. Simulations. In this section we conduct a simulation study to compare the performance of FRAME to several alternative functional approaches. We first generated $p = 6$ functional predictors using $X_{ij}(t) = F(t)^T \theta_{ij} + \epsilon_{ij}(t)$, where $F(t)$ was a 3-dimensional Fourier basis, $\theta_{ij}$ was simulated from a $N(0, I_3)$ distribution, and the $\epsilon_{ij}(t)$’s were independent over $i$, $j$ and $t$ with a $N(0, 0.1^2)$ distribution. Each predictor was sampled at 150 equally spaced time points over the interval $t \in [0,1]$. In addition, $q = 8$ scalar predictors, $Z_{ik}$, were simulated from a standard normal distribution. Next, we generated $\beta_j(s,t) = \beta_{j1}(s) + \beta_{j2}(t) + 0.1\,\beta_{j1}(s)\beta_{j2}(t)$, where $\beta_{j1}(s) = \mathbf{b}(s)^T \eta_{j1}$, $\beta_{j2}(t) = \mathbf{b}(t)^T \eta_{j2}$, $\mathbf{b}(\cdot)$ was a 5-dimensional cubic spline basis, and $\eta_{j1}$ and $\eta_{j2}$ were independent $N(0, I_5)$ vectors.

The responses were generated from the model

(12)    $Y_i(s_l) = \sum_{j=1}^{p} g_j\left( \int \beta_j(s_l, t) X_{ij}(t)\,dt \right) + \sum_{k=1}^{q} \gamma_k Z_{ik} + \varepsilon_i(s_l), \qquad i = 1, \ldots, n,$

where $\varepsilon_i(s_l) \sim N(0, 0.1^2)$ and $Y_i(s)$ was sampled at 20 equally spaced time points $s_1, \ldots, s_L$ over the interval $s \in [0,1]$. We set $g_1(x) = \sin(x)$, $g_2(x) = \cos(x)$ and $g_j(x) = 0$ for $j = 3, \ldots, 6$. Thus, only the first two functional predictors were signal variables, with the remainder representing noise. Similarly, we set $\gamma_1 = 1$ and $\gamma_k = 0$ for $k = 2, \ldots, 8$ so the last seven scalar predictors were noise variables. All training data sets were generated using $n = 200$ observations.
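A data set of this form can be generated along the following lines. This is a sketch under several assumptions of ours, not the simulation code behind Table 1: the Fourier basis is taken to be (1, sin 2πt, cos 2πt), the 5-dimensional cubic spline basis is a truncated-power basis with one knot at 0.5, the interaction weight is read as 0.1, and the integral is approximated by a Riemann sum. All function names are hypothetical.

```python
import numpy as np

def fourier_basis_3(t):
    """Assumed 3-dimensional Fourier basis: 1, sin(2*pi*t), cos(2*pi*t)."""
    return np.column_stack([np.ones_like(t), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])

def spline_basis_5(x, knot=0.5):
    """Assumed 5-dimensional cubic spline basis (truncated power form, one knot)."""
    return np.column_stack([np.ones_like(x), x, x**2, x**3, np.clip(x - knot, 0, None)**3])

def simulate(n=200, p=6, q=8, T=150, L=20, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, T)
    s = np.linspace(0.0, 1.0, L)
    theta = rng.normal(size=(n, p, 3))
    # Functional predictors: basis expansion plus N(0, 0.1^2) noise, shape (n, p, T).
    X = theta @ fourier_basis_3(t).T + 0.1 * rng.normal(size=(n, p, T))
    Z = rng.normal(size=(n, q))
    b1 = rng.normal(size=(p, 5)) @ spline_basis_5(s).T   # beta_j1(s), shape (p, L)
    b2 = rng.normal(size=(p, 5)) @ spline_basis_5(t).T   # beta_j2(t), shape (p, T)
    beta = b1[:, :, None] + b2[:, None, :] + 0.1 * b1[:, :, None] * b2[:, None, :]
    # Riemann-sum approximation of the integral of beta_j(s_l, t) X_ij(t) dt, shape (n, p, L).
    proj = np.einsum("jlt,ijt->ijl", beta, X) * (t[1] - t[0])
    g = [np.sin, np.cos] + [np.zeros_like] * (p - 2)     # only the first two are signal
    signal = sum(g[j](proj[:, j, :]) for j in range(p))
    gamma = np.zeros(q)
    gamma[0] = 1.0                                       # only the first scalar predictor matters
    Y = signal + (Z @ gamma)[:, None] + 0.1 * rng.normal(size=(n, L))
    return {"X": X, "Z": Z, "Y": Y, "t": t, "s": s}
```

Calling `simulate()` with the defaults returns arrays of shape (200, 6, 150) for the functional predictors, (200, 8) for the scalar predictors and (200, 20) for the responses.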

We compared FRAME to six possible competitors. The simplest, Mean, ignored the predictors and used the average of the training response, at each time point $s$, to predict the responses on the test data. This method serves as a benchmark to illustrate the improvement in prediction accuracy that can be achieved using the predictors. The next method was the Classical Functional Linear Regression model given by (2). We fit (2) by computing the first $G$ functional principal components (FPC) for the response function, and also the first $K$ FPCs for each predictor function. We then used the 8 scalar predictors and the $6K$ FPC scores from the 6 functional predictors to fit separate linear regressions to each of the first $G$ FPC scores on the response. To form a final prediction for the response function, we multiplied the estimated FPC scores by the first $G$ principal component functions. The value of $G$, between 1 and 4, and $K$, between 1 and 3, were both chosen using 10-fold cross-validation. The classical functional approach does not automatically perform variable selection, so we also fit a variant (PCA-L). The only difference between Classical and PCA-L is that the latter method used the group Lasso to compute the linear regressions between the response and predictor principal component scores and hence selected a subset of the predictors.

The fourth method, PCA-NL, was identical to PCA-L except that a nonlinear generalized additive model (GAM) was used to regress the response principal component scores on the predictor scores. Standard GAM does not automatically perform variable selection, so we fit PCA-NL using a variant of SPAM [Ravikumar et al. (2009)], which implements a penalized nonlinear additive model procedure and hence selects a subset of the predictors. We used the Lasso penalty function with the tuning parameter, $\lambda$, chosen over a grid of 20 values via 10-fold CV. Similarly, the dimension of the nonlinear functions used in SPAM were chosen, between 4 and 6, using 10-fold CV.

The next method, Last Observation, took as inputs $Z_{i1}, \ldots, Z_{iq}$ plus the last observed values of $X_{ij}(t)$, that is, $X_{i1}(t_{150}), \ldots, X_{i6}(t_{150})$. We then used the resulting 14 scalar predictors to estimate separate GAM regressions for the response at each observed point, $Y(s_1), \ldots, Y(s_{20})$, a total of 20 different regressions. As with PCA-NL, we used a variant of SPAM to perform variable selection. While using only the last observed time point may appear to be a naive approach, such methods are common in situations like the HSX data, where it is often assumed that all the information is captured at the latest time point. Hence, we implemented this approach to illustrate the potential advantage from incorporating the entire functional predictor.

The final comparison method, FPCA-FAR, combined the FPCA approach with the FAR method proposed in Fan, James and Radchenko (2014). FAR does not directly correspond to our setting because it is designed for problems involving functional predictors but only a scalar response. FPCA-FAR addresses this limitation by producing $G$ separate FAR fits, one for each of the first $G$ FPC scores. The FAR method has similar tuning parameters to SPAM, which were again chosen using 10-fold CV.

In fitting FRAME we set $\beta_j(s,t) = \beta_{j1}(s) + \beta_{j2}(t)$, where $\beta_{j1}(s)$, $\beta_{j2}(t)$
and $g_j(x)$ were approximated using cubic splines. The dimension of the
basis for both $\beta_{j2}(t)$ and $\beta(t)$ was selected as the value among 4, 5, 6 which
gave the lowest prediction error for $X_{ij}(t)$ on the held-out time points. In
particular, for each possible dimension we held out every 5th observed time
point for each $X_{ij}(t)$, produced a least squares fit using the remaining
observations, and then calculated the squared error between the observed and
predicted values of $X_{ij}(t)$ at the held-out time points. The value of $\lambda$ and the
dimensions of $\beta_{j1}(s)$ and $g_j(x)$ were all chosen using 10-fold CV in a similar
fashion to the other comparison methods. We set $\rho$ equal to the identity
function, which corresponds to a group lasso type penalty function.
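
The hold-out rule for the basis dimension can be sketched as follows. This is a minimal illustration and not the authors' code: the truncated-power form of the cubic spline basis, the equally spaced knots, the rescaling of time to [0, 1] and the function names `spline_basis`, `holdout_error` and `select_dimension` are all assumptions made for the example.

```python
import numpy as np

def spline_basis(t, dim):
    """Cubic truncated-power basis with `dim` columns (dim >= 4) on [0, 1].

    Assumption: equally spaced interior knots; the paper only states that
    cubic splines were used.
    """
    t = np.asarray(t, dtype=float)
    cols = [t**k for k in range(4)]                      # 1, t, t^2, t^3
    knots = np.linspace(0.0, 1.0, (dim - 4) + 2)[1:-1]   # interior knots
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def holdout_error(t, x, dim, step=5):
    """Fit by least squares without every `step`-th point, score on those points."""
    held = np.zeros(len(t), dtype=bool)
    held[step - 1::step] = True                          # every 5th observation
    B = spline_basis(t, dim)
    coef, *_ = np.linalg.lstsq(B[~held], x[~held], rcond=None)
    resid = x[held] - B[held] @ coef
    return float(np.sum(resid**2))

def select_dimension(t, curves, dims=(4, 5, 6)):
    """Return the dimension with the lowest total held-out squared error."""
    errors = {d: sum(holdout_error(t, x, d) for x in curves) for d in dims}
    return min(errors, key=errors.get), errors
```

Here `curves` would be a list of observed predictor trajectories on a common time grid `t`; the returned dictionary makes it easy to inspect how close the candidate dimensions are.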

In order to match a real-life setting, we deliberately generated the data
from a model that does not match the FRAME fit. In particular, the true
$\beta_j(s,t)$ function included an interaction term, while the FRAME estimate
was restricted to be additive; the predictors were generated from a Fourier
basis but approximated using a spline basis; and the nonlinear functions,
$g_1(x)$ and $g_2(x)$, were generated according to sin and cos functions,
respectively, but approximated using a spline basis. In addition, all the various
FRAME tuning parameters were automatically selected using CV, as part
of the fitting process, so the true dimension of the basis functions was not
assumed to be known.

Table 1

False positive (FP) rates, false negative (FN) rates and prediction errors (PE) for
the seven comparison methods, averaged over the 100 simulation runs. The top rows
relate to the functional predictors, $X_j(t)$, and the lower rows to the scalar predictors, $Z_k$.
Standard errors are provided in parentheses

                Mean       Classical  PCA-L      PCA-NL     Last Obs.  FPCA-FAR   FRAME
Functional  FP  -          -          0.0600     0.4200     0.2395     0.0375     0.0000
                -          -          (0.0182)   (0.0333)   (0.0101)   (0.0114)   (0.0000)
            FN  -          -          0.4400     0.0200     0.3002     0.1300     0.0600
                -          -          (0.0163)   (0.0098)   (0.0102)   (0.0220)   (0.0163)
Scalar      FP  -          -          0.0971     0.3671     0.2419     0.0400     0.0000
                -          -          (0.0175)   (0.0247)   (0.0089)   (0.0117)   (0.0000)
            FN  -          -          0.0000     0.0000     0.0000     0.0000     0.0000
                -          -          (0.0000)   (0.0000)   (0.0000)   (0.0000)   (0.0000)
PE              1.1983     0.1108     0.1040     0.1284     0.2727     0.0680     0.0651
                (0.0035)   (0.0030)   (0.0029)   (0.0024)   (0.0035)   (0.0019)   (0.0020)

We generated 100 different training data sets and fit each of the seven
methods to all 100 data sets. We computed false negative (FN) rates, the
fraction of signal variables incorrectly excluded, and false positive (FP)
rates, the fraction of noise variables incorrectly included. The prediction
error (PE) was also calculated on a large test data set with N = 1000
observations. The results, averaged over the
100 simulations, are displayed in Table 1, with standard errors shown in
parentheses. Since the Last Observation method contains separate fits for
each time point, its FN and FP rates are averaged over the twenty different
fits. Figure 4 plots the prediction errors over s.
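
The FP and FN rates defined above amount to simple set arithmetic. The sketch below is illustrative only; the function name `selection_error_rates` and the convention of returning 0 when a group is empty are assumptions, not part of the paper.

```python
def selection_error_rates(selected, signal, all_vars):
    """Return (FP rate, FN rate) for one fitted model.

    FP rate: fraction of noise variables that were incorrectly included.
    FN rate: fraction of signal variables that were incorrectly excluded.
    """
    selected, signal, all_vars = set(selected), set(signal), set(all_vars)
    noise = all_vars - signal
    fp = len(noise & selected) / len(noise) if noise else 0.0
    fn = len(signal - selected) / len(signal) if signal else 0.0
    return fp, fn


def average_rates(runs, signal, all_vars):
    """Average the (FP, FN) pairs over a list of selected sets, one per run."""
    pairs = [selection_error_rates(sel, signal, all_vars) for sel in runs]
    n = len(pairs)
    return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n
```

For a method such as Last Observation, which produces one fit per time point, the same averaging would be applied across the twenty fits before averaging over simulation runs.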

All methods show significant improvement over the Mean approach, indi¬ 
cating that the scalar and functional variables have real predictive ability. 
FRAME had perfect variable selection results on the scalar predictors, with 
false positive and false negative rates both being zero. All methods had zero 
false negative rates on the scalar predictors. However, PCA-NL and Last 
Observation both had high false positive rates. FRAME also did a much 
better job than all its competitors in identifying the functional predictors. 
PCA-NL and Last Observation had high false positive rates for the func¬ 
tional predictors, and the PCA-L and Last Observation methods had high 
false negative rates. In terms of prediction error, FRAME is considerably 
superior to all methods except for FPCA-FAR. In comparing FRAME to 
FPCA-FAR, we note that while FRAME only results in a small improvement
in terms of prediction error, it does a far better job in selecting the
correct variables.

Fig. 4. Mean prediction errors for five of the comparison methods at each of the 20 time
points at which the response function was observed. The Classical and PCA-L curves were
not plotted to make the figure easier to read.

5. Forecasting demand decay rates. In this section we provide results 
from applying our FRAME approach to the HSX data. In doing so, we 
assume that the revenue curves of any two movies are independent, given 
the predictors. This assumption is not unreasonable because managers use 
strategic scheduling [Einav (2010)] to minimize the risk of two movies si¬ 
multaneously competing for the same audience. More importantly, the HSX 
data (i.e., our predictors) have incorporated relevant information about the 
movies [Foutz and Jank (2010)]. Hence, one might expect much lower
correlations among movies after conditioning on the predictors.

Figure 5 illustrates the modeling setup. Recall that for each movie we
collect four functional predictors: the intra-day average price, the number
of accounts shorting the stock, the number of shares sold and the number
of shares held short. These curves capture related yet distinct aspects of
consumer sentiment and word of mouth about a movie. The four functional
predictors (represented using the green curve before the movie release in
Figure 5) are observed from 52 up to 10 weeks prior to the movie’s release. We
then use FRAME to form predictions of $Y_i(s)$ = log (cumulative revenue for
movie i at week s) (blue line after the movie release).

In Section 5.1 we test the predictive accuracy of FRAME on the HSX
data in relation to that of several competing methods. Then in Section 5.2
we discuss a graphical approach to obtain new insight into the relationship
between VSMs and movies’ success.

Fig. 5. Illustration of our model.

5.1. Prediction accuracy. We compare a number of functional and
nonfunctional methods to predict the box office cumulative revenue pattern for
our 262 movies. Table 2 provides weekly mean absolute errors (MAE)
between the predicted and actual cumulative box office revenue (on the log
scale) for FRAME as well as six comparison methods. Specifically, we
randomly divide the movies into training and test data (180 and 82 movies,
resp.), fit the various methods using the training data and then compute
MAE for week s on the test data:

(13)    $\mathrm{MAE}(s) = \frac{1}{|T|} \sum_{i \in T} |Y_i(s) - \hat{Y}_i(s)|,$

where T represents the test data and $\hat{Y}_i(s)$ the prediction for week s using
a given method. We repeat this process over 20 random partitions of the
movies and average the resulting MAEs. All seven methods are implemented
in the same fashion as was used in the simulation analysis.

Table 2

Mean absolute errors (MAEs) on test data for FRAME and six competing methods
averaged over twenty random partitions of the movies

         Mean       Classical  PCA-L      PCA-NL     Last Obs.  FPCA-FAR   FRAME
Week 1   2.1898     1.5365     1.5856     1.1793     1.1534     1.2011     1.0952
Week 2   2.0490     1.4214     1.4582     1.0951     1.0683     1.1165     1.0116
Week 3   1.9057     1.3107     1.3372     1.0157     1.0335     1.0323     0.9482
Week 4   1.8335     1.2694     1.2900     0.9915     0.9970     1.0106     0.9364
Week 5   1.7907     1.2490     1.2666     0.9815     0.9923     1.0002     0.9305
Week 6   1.7610     1.2385     1.2527     0.9785     0.9944     0.9960     0.9324
Week 7   1.7418     1.2329     1.2431     0.9759     0.9868     0.9952     0.9371
Week 8   1.7294     1.2301     1.2379     0.9749     1.0132     0.9947     0.9397
Week 9   1.7199     1.2269     1.2337     0.9759     0.9938     0.9952     0.9432
Week 10  1.7144     1.2261     1.2322     0.9772     1.0051     0.9962     0.9460

Table 3

Average number of times each of the four predictors was selected for each method

           Price  Account short  Shares sold  Shares short
FRAME      1.00   1.00           0.00         0.00
FPCA-FAR   1.00   0.30           0.00         0.00
PCA-L      1.00   0.80           0.00         0.00
PCA-NL     1.00   0.05           0.30         0.65
Last Obs.  1.00   0.58           0.62         1.00
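
The evaluation protocol around (13) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: `fit_predict` is a hypothetical callable standing in for any of the seven methods, and the array layout (one row per movie, one column per week) is chosen for convenience.

```python
import numpy as np

def weekly_mae(y_true, y_pred):
    """MAE(s) for each week s; rows are test movies, columns are weeks."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(diff), axis=0)

def average_mae_over_partitions(Y, fit_predict, n_train=180, n_partitions=20, seed=0):
    """Average the weekly test MAE over random train/test partitions.

    Y           : (n_movies, n_weeks) array of log cumulative revenues.
    fit_predict : hypothetical callable (train_idx, test_idx) -> predictions
                  for the test movies, shape (len(test_idx), n_weeks).
    """
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(n_partitions):
        perm = rng.permutation(Y.shape[0])
        train, test = perm[:n_train], perm[n_train:]
        maes.append(weekly_mae(Y[test], fit_predict(train, test)))
    return np.mean(maes, axis=0)
```

With 262 movies and `n_train=180`, each partition leaves 82 test movies, matching the split described above. The Mean benchmark corresponds to a `fit_predict` that returns the training-set average curve for every test movie.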

A few trends are clear from Table 2. First, all methods dominate Mean, in¬ 
dicating that the HSX curves contain useful predictive information. Second, 
the errors tend to decline over time, suggesting that there is more variabil¬ 
ity in the early weeks, but, to some extent, this averages out over time. 
Third, PCA-NL, FPCA-FAR and Last Observation give similar results and 
dominate Classical and PCA-L. Thus, there is clear evidence of a nonlinear 
relationship. Finally, FRAME provides superior results in comparison to the 
other six approaches for each of the ten weeks. The relative advantage of 
FRAME is highest in the first couple of weeks where predictions appear to 
be the most difficult. 

Table 3 records the number of times each of the four predictors was
selected, averaged over the 20 different training data sets. The intra-day
average price variable appears to be the most important, with all methods
selecting it on every run. FRAME also selected the variable of accounts
trading short but ignored the remaining two predictors. By comparison,
Last Observation chose the largest models, often including all four
predictors. This may have been to compensate for the fact that the method only
observed the final time point for each curve.

To further benchmark FRAME against alternative methods that are
commonly used in the literature on movie demand forecasting [Sawhney and
Eliashberg (1996)], Table 4 provides error rates for seven additional models.
For each of these models, we estimate ten separate weekly linear regressions,
one for each of the ten revenue weeks. We fit each regression to the training
data, using the same 20 random partitions as in Table 2, and report the
average MAEs on the test data. The first six models are based on individual
movie features, respectively, genre (e.g., drama or comedy), sequel (yes/no),
production budget (in dollars), MPAA rating, run time (in minutes) and
studio (e.g., Universal or 20th Century Fox). The seventh model is based
on a combination of all six features. The best individual predictor appears
to be genre, but combining all six predictors gives the best results.
However, the MAEs from the combined model are still significantly higher than
for the best methods in Table 2, suggesting that the HSX curves provide
additional information beyond that of the movie features.

Table 4

Mean absolute errors on test data using various characteristics of the movies. Errors are
averaged over twenty random partitions

         Genre     Sequel    Budget    Rating    Run time  Studio    All
Week 1   1.632     2.136     1.899     1.850     2.209     2.040     1.445
Week 2   1.589     2.003     1.762     1.749     2.064     1.915     1.395
Week 3   1.510     1.858     1.620     1.634     1.905     1.770     1.312
Week 4   1.487     1.792     1.564     1.604     1.829     1.714     1.304
Week 5   1.472     1.753     1.535     1.587     1.784     1.685     1.296
Week 6   1.463     1.728     1.516     1.578     1.755     1.668     1.291
Week 7   1.458     1.713     1.501     1.569     1.735     1.656     1.287
Week 8   1.457     1.703     1.492     1.563     1.723     1.648     1.286
Week 9   1.457     1.695     1.487     1.561     1.714     1.642     1.287
Week 10  1.458     1.691     1.484     1.559     1.709     1.639     1.287
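
The feature-based benchmarks behind Table 4 fit one linear regression per revenue week. A minimal sketch of that idea is given below; the function name `weekly_linear_predictions` and the assumption that categorical features (genre, rating, studio) have already been dummy-coded into a numeric matrix are illustrative choices, not details taken from the paper.

```python
import numpy as np

def weekly_linear_predictions(F_train, Y_train, F_test):
    """Fit a separate least squares regression for every week and predict.

    F_train, F_test : (n_movies, n_features) numeric feature matrices
                      (categorical features assumed dummy-coded beforehand).
    Y_train         : (n_movies, n_weeks) log cumulative revenues.
    """
    A_train = np.column_stack([np.ones(len(F_train)), F_train])  # add intercept
    A_test = np.column_stack([np.ones(len(F_test)), F_test])
    # lstsq solves each column of Y_train independently, which is the same
    # as running one regression per week.
    coef, *_ = np.linalg.lstsq(A_train, Y_train, rcond=None)
    return A_test @ coef
```

The returned predictions can be scored with the same weekly MAE used for the functional methods, which keeps the comparison with Table 2 on equal footing.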

5.1.1. Why does FRAME predict so well? We now offer a closer look 
into when (and potentially why) the prediction accuracy of FRAME is su¬ 
perior to that of the alternative methods in Tables 2 and 4. To that end, we 
investigate the relationship between FRAME’s mean absolute percentage
error (MAPE) in cumulative revenues over the first ten weeks since release
and film characteristics, such as budget, genre, MPAA rating, and the
volume and valence of critics’ reviews. Similarly, we examine how the relative
performance of FRAME (i.e., the difference between FRAME’s MAPE and
the lowest MAPE of either PCA-NL or FPCA-FAR) is associated with film 
characteristics. Tables 5 and 6 show the linear regression results. 

Table 5 shows that FRAME performs well (i.e., has a low prediction er¬ 
ror) for movies that are sequels, rated below R, have a shorter runtime, are 
released by a major studio such as Paramount, Warner Brothers, Univer¬ 
sal or Twentieth Century Fox, and reviewed by a larger number of critics. 
Intuitively, these results suggest that FRAME performs especially well for 
movies that enjoy a greater capability for creating pre-release buzz. For in¬ 
stance, sequels build upon the success of their predecessors; films released 
by major studios benefit from significant advertising and publicity before 
opening; those with lower MPAA ratings, for example, G and PG, appeal to 
wider audiences; and greater attention from the critics, due to, for instance, 
a film’s quality or controversies, could further fuel the public’s fascination. 
Such firm- or consumer-generated buzz provides rich information to the 
HSX traders, who rapidly integrate the information into the stock trading. 
FRAME seems to be capable of capturing the dynamics of such buzz and 
translating it into accurate predictions.

Table 5

Linear regression of FRAME’s prediction error on film
characteristics

Name                   Coefficient  Std err.  t        p-value
Intercept               0.098       0.068      1.439   0.151
Sequel                 -0.033       0.014     -2.314   0.022
Budget                  0.000       0.000      0.587   0.558
Action                 -0.015       0.050     -0.296   0.768
Animated                0.016       0.054      0.306   0.760
Comedy                 -0.009       0.050     -0.185   0.853
Drama                  -0.011       0.050     -0.216   0.829
Horror                  0.004       0.050      0.086   0.931
Other genres            0.066       0.060      1.098   0.273
Rating below R         -0.026       0.011     -2.417   0.016
Runtime                 0.001       0.000      2.516   0.013
Major studio           -0.039       0.010     -3.744   0.000
Oscar                   0.030       0.028      1.062   0.289
Critics volume         -0.001       0.000     -7.600   0.000
Critics valence         0.006       0.005      1.322   0.188
Consumer WOM volume     0.000       0.000      1.943   0.053
Consumer WOM valence    0.004       0.006      0.654   0.514

Table 6

Linear regression of the difference between FRAME’s prediction
error and the lowest error of either PCA-NL or FPCA-FAR on
film characteristics

Name                   Coefficient  Std err.  t        p-value
Intercept               0.011       0.024      0.465   0.642
Sequel                  0.000       0.005      0.036   0.971
Budget                  0.000       0.000     -0.307   0.759
Action                 -0.013       0.018     -0.746   0.456
Animated               -0.019       0.019     -1.023   0.308
Comedy                 -0.018       0.017     -1.010   0.314
Drama                  -0.023       0.017     -1.326   0.186
Horror                 -0.012       0.018     -0.685   0.494
Other genres            0.015       0.021      0.721   0.471
Rating below R         -0.001       0.004     -0.302   0.763
Runtime                 0.000       0.000     -1.101   0.272
Major studio           -0.006       0.004     -1.731   0.085
Oscar                   0.017       0.010      1.764   0.079
Critics volume          0.000       0.000      3.198   0.002
Critics valence        -0.001       0.002     -0.533   0.595
Consumer WOM volume    -0.000       0.000     -3.901   0.000
Consumer WOM valence    0.004       0.002      1.936   0.054

Fig. 6. Top 6 movies with the smallest FRAME prediction error (panels: THE
MANCHURIAN CANDIDATE; THE TERMINAL; ICE PRINCESS; BEAUTY SHOP; PETER PAN;
THE RING TWO; horizontal axis: week): the solid lines correspond to FRAME’s
prediction; the dashed lines show the corresponding true values. The
two closest competitors are given by the dotted lines (PCA-NL) and the dash-dotted lines
(FPCA-FAR), respectively.


Figure 6 shows the six movies for which FRAME predicts the best in terms
of MAPE. Four of these six movies were released by major studios, the
exceptions being THE RING TWO and THE TERMINAL. Moreover, all of
them were rated below R except for THE MANCHURIAN CANDIDATE,
and all attracted more than a hundred critics’ reviews. A third of them are
sequels, specifically PETER PAN and THE RING TWO, as compared to
11% in the sample. Moreover, other sequels are not far down the list. For example,
FRAME also provides excellent predictions for sequels like MISS
CONGENIALITY 2 and OCEAN’S TWELVE. By contrast, FRAME predicts the
least accurately for the following movies: KAENA: THE PROPHECY, THE
INTENDED and EULOGY. None of these movies was a sequel or produced
by a major studio. Only KAENA: THE PROPHECY had a below-R rating,
and the volumes of critics’ reviews for all three movies were below 35.

It is possible that movies with some of the above identified
characteristics (sequels, low MPAA ratings, major studio releases and more
critics’ reviews) are easier to predict in general by any method, not only by
FRAME. Indeed, Table 6 shows that FRAME does not have a statistically
significant advantage (although directionally so) over PCA-NL or FPCA-FAR
in predicting demand for films with the above characteristics. Nonetheless,
FRAME continues to outperform the alternative methods for films generating
more viewer ratings online, suggesting its distinct ability to incorporate
information potentially not captured by alternative methods, such as
potential viewers’ interest that is not widely available ten weeks prior to a film’s
release.

5.2. Model insight. The previous section has shown that using a fully 
functional regression method such as FRAME can be beneficial for forecast¬ 
ing demand decay patterns. However, while nonlinear functional regression 
methods can result in good predictions, one downside is that because both 
model-input (HSX trading paths) as well as model-output (cumulative box 
office demand) arrive in the form of functions, it is hard to understand the 
relationship between the response and the predictors. 

A useful graphical method to address this shortcoming is to visualize 
the relationship by generating candidate predictor curves, using the fitted 
FRAME model to predict corresponding responses and then plotting $X(t)$
and $Y(s)$ together. The idea is similar to the “partial dependence plots”
described in Hastie, Tibshirani and Friedman (2001); however, in contrast 
to their approach, our plots take into account the joint effect of all predictors 
(and are hence not “partial”); we thus call our graphs “dependence plots.” 

Figure 7 displays several possible dependence plots with idealized input 
curves in the left panel and corresponding output curves from FRAME in the 
right panel. Note that since in our empirical analysis the intra-day average 
price was by far the most important predictor, we use that variable as $X(t)$
and fit FRAME with this single functional predictor. We study a total of 
four different scenarios. The top row corresponds to a situation where all 
input curves start and end at the same values (0 and 100, resp.); their only 
difference is how they get from the start to the end: the middle curve (solid 
line) grows at a linear rate; the upper and lower curves (dotted and dashed 
lines) grow at logarithmic and exponential rates, respectively. In that sense, 
the three curves represent movies whose HSX prices either grow at a constant 
(linear) rate, or grow fast early but then slow down (logarithmic) or grow 
slowly early only to increase toward release (exponential). 
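
Candidate input curves of this kind are easy to generate. The sketch below is one possible construction and not the curves used for Figure 7: the specific logarithmic and exponential formulas, the `rate` parameter, the number of grid points and the normalized time axis are all assumptions made for illustration.

```python
import numpy as np

def idealized_price_curves(n_points=50, start=0.0, end=100.0, rate=5.0):
    """Return a normalized time grid and three curves sharing both endpoints.

    linear      : constant growth rate
    logarithmic : fast early growth that levels off (concave)
    exponential : slow early growth that accelerates toward release (convex)
    """
    u = np.linspace(0.0, 1.0, n_points)            # 0 = first week, 1 = last week
    shapes = {
        "linear": u,
        "logarithmic": np.log1p(rate * u) / np.log1p(rate),
        "exponential": np.expm1(rate * u) / np.expm1(rate),
    }
    # Rescale every shape from [0, 1] to [start, end].
    return u, {name: start + (end - start) * s for name, s in shapes.items()}
```

Each curve would then be passed through the fitted model in place of an observed HSX price history, and the predicted revenue curves plotted next to the inputs.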

The top right panel shows the result: the logarithmic HSX price curve 
(dotted line) results in the largest cumulative revenue. In particular, its cu¬ 
mulative revenue is larger compared to the linear price curve (solid line), and 
both logarithmic and linear price curves beat the cumulative revenue gen¬ 
erated by the exponential price curve (dashed line). In fact, the logarithmic 
price curve results in cumulative revenue that continues to grow significantly, 
especially in later weeks. This is in contrast to the cumulative revenue gen¬ 
erated by the exponential price curve which becomes almost constant after 
week two or three. 

What do these findings imply? Recall that all three HSX price curves
start and end at the same value (0 and 100, resp.), so all observed
differences are only with respect to their shape. This suggests that shapes matter
enormously in VSMs. It also suggests that more buzz early on (i.e., the
logarithmic shape) has much more impact on the overall revenue compared to
a last moment hype closer to release time (i.e., the exponential shape).

Fig. 7. Dependence plots for different input shapes. The left panels contain various
idealized input curves of HSX prices over time. Each figure plots three possible shapes for the
observed HSX trading history of a movie. The right panels plot the corresponding predicted
cumulative revenues using FRAME. For example, in the top row we observe that an HSX
trading curve which increases rapidly and then levels off (dotted line) corresponds to a
higher predicted revenue than either a linear pattern (solid line) or slow start with a large
increase at the end (dashed line).

The next two rows of Figure 7 show additional shape scenarios with both
rows displaying input curves with a common linear shape. In the second
row the curves are converging toward a common HSX value, while the input
curves in the third row are diverging. The case of diverging curves suggests
that the larger the most recent HSX value, the larger is the corresponding
cumulative box office revenue. The converging case emphasizes the effect
of recency of information: as in panel 1, all HSX price curves end at the
same value; however, unlike in panel 1, they all have the same shape. We
can see that the corresponding cumulative box office revenue also almost
converges in week 5. This suggests that the difference in shape (e.g., linear
vs. logarithmic vs. exponential) carries important information about the
change in the dynamics of word of mouth or consumer-generated buzz which
translates into significant revenue differences.

The last row in Figure 7 shows yet another scenario of HSX price curves: 
an S-shape (dashed line) and an inverse-S shape (dotted line). Notice that 
the inverse-S shape features spurts of extreme growth both at the very be¬ 
ginning and at the very end, almost like a combination of logarithmic and 
exponential growth from panel 1. However, while the spurts resemble the 
logarithmic and exponential shapes, their overall magnitude is smaller com¬ 
pared to that in panel 1. As a result, the cumulative revenue is smaller 
compared to that of the linear growth. This suggests that while the
dynamics of HSX price curves matter, their magnitude and timing matter even
more, as the linear HSX price curve features a much more steady and
sustained overall change in HSX prices compared to the inverse-S shape (which
is constant most of the time with two small spurts at the beginning and the 
end). More evidence for this can be seen in the S-shaped HSX price curve 
(dashed line): while it does feature some change, most of the change happens 
in the middle of the curve which leads to the lowest of the three cumulative 
revenue curves. 

6. Conclusion. This paper makes three significant contributions. First, 
we develop a new nonlinear regression approach, FRAME, which is capable 
of forming predictions on a functional response given multiple functional 
predictors and simultaneously conducting variable selection. Our results on 
both the HSX and simulated data demonstrate that FRAME is capable of 
providing a considerable improvement in prediction and variable selection 
accuracy relative to a host of competing methods. Second, we introduce a 
new and promising data source to the statistics community. Online virtual 
stock markets (VSMs) are market-driven mechanisms to capture opinions 
and valuations of large crowds in a single number. Our work shows that the 
information captured in VSMs is rich but requires appropriate and creative 
statistical methods to extract all available knowledge [Jank and Shmueli 
(2006)]. Finally, we make our approach practical for inference purposes by 
developing dependence plots to illustrate the relationship between input and 
output curves. 

FRAME overcomes some of the technical difficulties encountered in other
functional models. For instance, FRAME does not require the calculation
of eigenfunctions, as is the case with our benchmark method, FPCA, in, for
example, Tables 1 or 3. In FPCA, we first compute the principal
components of the response curves, and then apply standard modeling techniques
to the principal component scores. However, since the response curves are
observed with random error, so are the corresponding eigenfunctions. While
approaches for removing this random variation from the eigenfunctions exist
[Yao, Müller and Wang (2005b)], FRAME does not rely on a principal
component decomposition and thus does not encounter this type of challenge.

Our results have important implications for managerial practice. Equipped
with the early forecasts of demand decay patterns, studio executives can
make educated decisions regarding weekly advertising allocations (both
before and after the opening weekend), selection of the optimal release date
to minimize competition with films from other studios and cannibalization
of films from the same studio [Einav (2007)], and negotiation of the weekly
revenue sharing percentages with the theater owners. Studios may be able
to better manage distributional intensity and consumer word of mouth. For
instance, for a movie predicted to have a strong opening weekend but fast
decay afterward, the studio may consider nationwide release, as opposed
to limited or platform release strategies (i.e., from initial limited release to
nationwide release later on), at the same time strategically managing
potentially negative word of mouth. The predicted demand decay of a film will also
shed crucial light on a studio’s sequential distributional strategies. For
example, a studio may consider delaying (or shortening) a movie’s video release
or international release timing if the movie is predicted to have longevity (or
faster decay) in theaters. Given that many academics have called for serious
research on the optimal release timing in the subsequent distributional
channels, such as home videos and international theatrical markets [Eliashberg,
Elberse and Leenders (2006)], and that these channels represent five times
more revenues than the domestic theatrical box office [MPAA (2007)], our
results bear further crucial implications for the profitability of the motion
picture industry.

A potential limitation of our approach is that it may only add value in 
inefficient markets where valuable information, above and beyond the in¬ 
formation contained in the final trading price, is captured by the shape of 
the trading histories, such as prices, accounts and shares. However, as
outlined earlier, previous research suggests that VSMs are not fully efficient.
Furthermore, the strong predictive accuracy of our functional approach
provides further empirical validation for this finding. In addition, the FRAME
methodology is applicable beyond just VSM data. In general, it can be used 
on any regression problem involving functional predictors and responses. 

We believe there are many other interesting applications of VSMs to
different domains, such as music, TV shows and video games, which all
share similar characteristics to movies, such as frequent introductions of new,
unique and experiential products, pop culture appeal and strong influence
of hype on demand. Such research would be made possible by the
increasing availability of data from VSMs for, for example, books (MediaPredict),
music (HSX), TV shows (Inkling) and video games (SimExchange).

APPENDIX: ALGORITHM DETAILS 

For a general penalty function, $\rho(t)$, we use the local linear approximation
method proposed in Zou and Li (2008) to solve (10). The penalty function
can be approximated as $\rho(\|\mathbf{f}\|) \approx \rho'(\|\mathbf{f}^*\|)\|\mathbf{f}\| + C$, where $\mathbf{f}^*$ is some vector
that is close to $\mathbf{f}$ and C is a constant. Hence, the only required change to
the FRAME algorithm for optimizing over general penalty functions is to
replace $\lambda$ by $\lambda^* = \lambda\rho'(\|\hat{\mathbf{f}}_j\|)$ in the calculation of $c_j$ in 2, and replace $\lambda$ by
$\lambda^* = \lambda\rho'(\|\hat{\boldsymbol{\phi}}_k\|)$ in the calculation of $c_k$ in 5, where $\hat{\mathbf{f}}_j$ and $\hat{\boldsymbol{\phi}}_k$ represent the
most recent estimates for $\mathbf{f}_j$ and $\boldsymbol{\phi}_k$. The initial estimates of $\mathbf{f}_j$ and $\boldsymbol{\phi}_k$ can be
obtained by using the Lasso penalty. This simple approximation allows the
FRAME algorithm to be easily applied to a wide range of penalty functions.
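
The reweighting step of the local linear approximation is small enough to write out. The sketch below only illustrates how the adjusted tuning parameter would be computed; the function name `lla_lambda` and the logarithmic penalty used as a second example are hypothetical and do not come from the paper.

```python
import numpy as np

def lla_lambda(lam, f_hat, rho_prime):
    """Adjusted tuning parameter lam * rho'(||f_hat||) for one group.

    lam       : original tuning parameter
    f_hat     : most recent estimate of the group's coefficient vector
    rho_prime : derivative of the penalty function rho
    """
    return lam * rho_prime(np.linalg.norm(f_hat))

# Identity penalty rho(t) = t: the derivative is 1, so lambda is unchanged,
# which recovers the group lasso type penalty.
identity_prime = lambda t: 1.0

# Hypothetical concave penalty rho(t) = log(1 + t), used only as an example:
# groups with large current norms receive a smaller effective lambda.
log_prime = lambda t: 1.0 / (1.0 + t)
```

In an iterative fit, the adjusted value would be recomputed from the latest estimates before each group update, starting from estimates obtained with the unweighted penalty.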

To implement the second step of the FRAME algorithm, we minimize
(11) with respect to the $\eta_j$’s. Directly minimizing (11) is difficult due to the
nonlinearity of the functions gj{x) ~ h(x)^^j. To overcome this difficulty, we 
observe that, with the estimates and aj. from Algorithm 1 and the current 
value, of rij, the first order approximation of g(0j-irij) ^ h.{6j-j^rij)'^^j 

is 

■ ^IjiiVj - Vj,oid)- 

Thus, we can approximate (11) by 

n rii / p 

■ ^IjiiVj - Vj,oid) 

i=l 1=1 V j=l 

where Ru = Yu - Yfj=i H^Iji'nj,oidVij “ ELi The above approxi¬ 

mation (14) is a quadratic function of rjj and can be minimized easily. Hence, 
the new value of r/j is updated as the minimizer of (14). We also note that 
if the estimate from Algorithm 1 is 0 , then the corresponding value of rjj 
will not be updated.